
    Context dependent substitution biases vary within the human genome

    Background: Models of sequence evolution typically assume that different nucleotide positions evolve independently. This assumption is widely appreciated to be an oversimplification. The best-known violations involve biases due to adjacent nucleotides. There have also been suggestions that biases exist at larger scales; however, this possibility has not been systematically explored. Results: To address this, we developed a method that identifies over- and under-represented substitution patterns and assesses their overall impact on the evolution of genome composition. Our method is designed to account for biases at smaller pattern sizes, removing their effects. We used this method to investigate context bias in the human lineage after its divergence from chimpanzee. We examined bias effects in substitution patterns 2 to 5 bp long and found significant effects at all sizes, including some individual 3- and 4-bp patterns with relatively large biases. We also found that bias effects vary across the genome: they differ between transposons and non-transposons, between different classes of transposons, and between regions near and far from genes. Conclusions: We found that nucleotides beyond the immediately adjacent one are responsible for substantial context effects, and that these biases vary across the genome.
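The abstract does not spell out the method. As a simplified illustration only (not the authors' algorithm), one way to ask whether a 3-bp context carries bias beyond what its 2-bp sub-contexts explain is to compare the observed substitution rate in the full context LYR→LWR with the rate expected if left- and right-neighbour effects combined independently, r(LY→LW)·r(YR→WR)/r(Y→W). The sketch assumes a gap-free, equal-length alignment of an ancestral and a derived sequence; the function names `context_counts` and `context_bias` are hypothetical.

```python
from collections import Counter


def context_counts(ancestral: str, derived: str) -> tuple[Counter, Counter]:
    """Tally sites and single-base substitutions by neighbouring context.

    Assumes a gap-free, equal-length alignment (an illustrative assumption).
    Only interior positions are used, so every site has both neighbours.
    Site keys are (left, centre, right), where "" means "neighbour not
    conditioned on"; substitution keys append the derived base.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned and of equal length")
    sites: Counter = Counter()
    subs: Counter = Counter()
    for i in range(1, len(ancestral) - 1):
        left, centre, right = ancestral[i - 1], ancestral[i], ancestral[i + 1]
        new = derived[i]
        for key in (("", centre, ""), (left, centre, ""),
                    ("", centre, right), (left, centre, right)):
            sites[key] += 1
            if new != centre:
                subs[key + (new,)] += 1
    return sites, subs


def context_bias(sites, subs, left, centre, right, new):
    """Observed / expected rate of left+centre+right -> left+new+right.

    The expected rate assumes left- and right-neighbour effects combine
    independently: r(left, .) * r(., right) / r(., .). Returns None when a
    rate is undefined (context never seen) or the expectation is zero.
    """
    def rate(l, r):
        n = sites[(l, centre, r)]
        return subs[(l, centre, r, new)] / n if n else None

    observed, r_left, r_right, r_base = (
        rate(left, right), rate(left, ""), rate("", right), rate("", ""))
    if None in (observed, r_left, r_right, r_base) or r_base == 0:
        return None
    expected = r_left * r_right / r_base
    return observed / expected if expected else None
```

On the toy alignment `ACGACAGCATCG` → `ATGACAGTATCG`, `context_bias(sites, subs, "A", "C", "G", "T")` returns 2.0: the ACG context mutates twice as often as its 2-bp sub-contexts predict. A real analysis would also need strand collapsing, significance testing, and longer patterns, none of which are shown here.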

    iPSCORE: A Resource of 222 iPSC Lines Enabling Functional Characterization of Genetic Variation across a Variety of Cell Types.

    Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent and have high genomic integrity (zero or few somatic copy-number variants), as determined using high-throughput RNA sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes have gene expression patterns that cluster by genetic background and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.

    Lessons from non-canonical splicing

    Recent improvements in experimental and computational techniques used to study the transcriptome have enabled an unprecedented view of RNA processing, revealing many previously unknown non-canonical splicing events. These include cryptic events located far from currently annotated exons and unconventional splicing mechanisms that have important roles in regulating gene expression. These non-canonical splicing events are a major source of newly emerging transcripts during evolution, especially when they involve sequences derived from transposable elements. They are therefore under precise regulation and quality control, which minimizes their potential to disrupt gene expression. We explain how non-canonical splicing can lead to aberrant transcripts that cause many diseases, and also how it can be exploited for new therapeutic strategies.

    Genetic regulation of RNA splicing and expression in cancer and stem cells

    A central question in genetics is how different classes of DNA variants affect RNA splicing and expression. While there has been substantial progress in associating single nucleotide polymorphisms and small indels with these phenotypes, only recently has affordable high-throughput sequencing provided the opportunity to assess the impact of somatic, rare, and copy number variants (CNVs) on RNA splicing and expression. In this thesis, I use high-throughput sequencing to investigate the effect of somatic variants in SF3B1 on RNA splicing and to characterize the genetic regulation of gene expression in induced pluripotent stem cells (iPSCs). In the first part, I examine the effect of recurrent somatic mutations in the splicing factor SF3B1 on RNA splicing in three different cancer types and find that SF3B1 mutants use hundreds of cryptic 3’ splice sites that are rarely used in samples without SF3B1 mutations. Sequence properties of these cryptic 3’ splice sites suggest that altered sterics may allow their usage in SF3B1 mutants. I also identify several candidate genes with out-of-frame cryptic splice sites that are used in a majority of transcripts in the mutants and may contribute to oncogenesis. In the second part, I examine the genetic regulation of gene expression in a collection of 215 human iPSCs using transcriptome and whole genome sequencing. I identify expression quantitative trait loci (eQTLs) for nearly six thousand genes, including markers of pluripotency such as POU5F1, LCK, IDO1, and CXCL5. A comparison to GTEx eQTLs reveals that iPSCs are statistically well powered for finding eQTLs and have a unique regulatory landscape. I identify biallelic and multiallelic CNV eQTLs and find that a substantial proportion of CNV eQTLs appear to affect intergenic regulatory regions. I also find that rare promoter variants weakly disrupt gene expression, while rare CNVs that overlap genes tend to disrupt gene expression with relatively large effect sizes. Overall, this thesis helps define the roles of somatic, rare, and copy number variants in the regulation of gene expression and splicing, and provides key insights into SF3B1-mutated cancers and into iPSCs as a model system for molecular association analyses.
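The thesis abstract does not describe its statistical model. A common starting point for eQTL mapping, shown here only as a hedged sketch and not as the thesis's pipeline, is an ordinary least-squares regression of one gene's expression on genotype dosage (0, 1 or 2 copies of the alternative allele). The function name `eqtl_test` is hypothetical, and covariates, expression normalization, p-values, and multiple-testing correction are all omitted.

```python
import math
from statistics import fmean


def eqtl_test(dosage, expression):
    """Fit expression = a + b * dosage by ordinary least squares.

    dosage: alternative-allele counts (0, 1 or 2), one per individual.
    expression: expression of one gene for the same individuals, same order.
    Returns (slope b, t statistic for b) with n - 2 residual degrees of freedom.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = fmean(dosage), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0:  # perfect fit: t is unbounded, or undefined if the slope is 0
        t = math.copysign(math.inf, slope) if slope else math.nan
    else:
        t = slope / se
    return slope, t
```

For six individuals with dosages `[0, 0, 1, 1, 2, 2]` and expression `[1.0, 1.2, 2.0, 2.2, 3.0, 3.2]`, the fitted slope is close to 1.0 (one expression unit per allele) with a large t statistic. Converting t to a p-value would need the Student t distribution, for example from SciPy.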

    deboever-sf3b1-2014

    Files for replicating the SF3B1 study.